Antipattern: Create Multiple Columns
Let's create multiple columns for multi-column attributes.
We'll cover the following
We still have to account for multiple values in the attribute, but we know that the new solution must store only a single value in each column. It may seem natural to create multiple columns in this table, each containing a single tag.
As we assign tags to a given bug, we’d put values in one of these three columns. Unused columns would remain null.
Let’s try to retrieve the Bugs
table data after updating the value of tag2
. Press “RUN” in the following playground and see the output.
Tasks we could do easily with a normal attribute are now more complex.
Searching for values#
When searching for bugs with a given tag, we must search all three columns because the tag string could occupy any of these columns.
If we’re searching for a given value across multiple columns, it is a clue that the multiple columns should be stored as a single logical attribute.
For example, to retrieve bugs that reference performance, we can use a query like this:
We may need to search for bugs that reference both tags, i.e., performance and printing. To do this, we use a query like the following one. It is important to use the parentheses correctly because OR
has lower precedence than AND
.
The syntax required to search for a single value over multiple columns is lengthy and tedious to write. We can make it more compact by using an IN
predicate in a slightly untraditional manner:
Adding and removing values#
Adding or removing a value from a set of columns presents its own issues. Simply using UPDATE
to change one of the columns isn’t safe since we can’t be sure which column is unoccupied, if any. We may have to retrieve the row into our application to see.
In this case, for instance, the result shows that tag2
is null. Now, we can form the UPDATE
statement.
Let’s retrieve the results after writing the query in the following playground.
We face the risk of another client going through the same steps of reading the row and updating it before we have updated the table after quering it. Depending on who applied their update first, either we or the other client risk getting an update conflict error or having their changes overwritten by the other. We can avoid this two-step query by using complex SQL expressions.
The following statement uses the NULLIF() function to make each column null if it equals a specific value. NULLIF()
returns null if its two arguments are equal.
Let’s retrieve the data after updating the table.
The following statement adds the new tag performance
to the first column that is currently null. However, if none of the three columns is null, the statement makes no change to the row, and the new tag value is not recorded. Constructing this statement is laborious. Notice also that we must repeat the string performance
six times.
Let’s retrieve the data after using UPDATE
in a query.
Ensuring uniqueness#
We probably don’t want the same value to appear in multiple columns, but the database can’t prevent this when we use the Multi-column Attributes antipattern. In other words, it’s hard to prevent the following statement:
Let’s retrieve the data in the following playground to see the results after using the INSERT
statement.
Handling growing sets of values#
Another weakness of this design is that three columns might not be enough. To keep the design of one value per column, we must define as many columns as the maximum number of tags a bug can have. How can we predict, at the time we define the table, what that greatest number will be?
One tactic is to guess at a moderate number of columns and expand later, if necessary, by adding more columns. Most databases allow us to restructure existing tables to add Bugs.tag4
or even more columns as we need them.
However, this change is costly in three ways:
-
Restructuring a database table that already contains data may require locking the entire table and blocking access for other concurrent clients.
-
Some databases implement this kind of table restructure by defining a new table to match the desired structure, copying the data from the old table, and then dropping the old table. If the table in question has a lot of data, this transfer can take a long time.
-
When we add a column in the set for a multi-column attribute, we must revisit every SQL statement in every application that uses this table, editing the statement to support new columns.
This is a meticulous and time-consuming development task. If we miss any queries that need edits, it can lead to bugs that are difficult to detect.